11 research outputs found

    Improving Abstraction in Text Summarization

    Full text link
    Abstractive text summarization aims to shorten long text documents into a human-readable form that contains the most important facts from the original document. However, the level of actual abstraction, as measured by novel phrases that do not appear in the source document, remains low in existing approaches. We propose two techniques to improve the level of abstraction of generated summaries. First, we decompose the decoder into a contextual network that retrieves relevant parts of the source document, and a pretrained language model that incorporates prior knowledge about language generation. Second, we propose a novelty metric that is optimized directly through policy learning to encourage the generation of novel phrases. Our model achieves results comparable to state-of-the-art models, as determined by ROUGE scores and human evaluations, while achieving a significantly higher level of abstraction as measured by n-gram overlap with the source document.
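
    The abstraction level discussed above is essentially an n-gram overlap measure between the summary and its source. A minimal sketch of such a novelty score is given below; the choice of n, the whitespace tokenization, and the function names are illustrative assumptions, not the paper's exact definition.

```python
# Hedged sketch of an n-gram novelty score: the fraction of summary n-grams
# that never appear in the source document. Tokenization and n are assumptions.

def ngrams(tokens, n):
    """Return the set of n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novelty(summary_tokens, source_tokens, n=3):
    """Fraction of summary n-grams that do not appear in the source."""
    summary_ngrams = ngrams(summary_tokens, n)
    if not summary_ngrams:
        return 0.0
    source_ngrams = ngrams(source_tokens, n)
    return len(summary_ngrams - source_ngrams) / len(summary_ngrams)

# Example with simple whitespace tokenization (an assumption):
source = "the quick brown fox jumps over the lazy dog".split()
summary = "a swift brown fox leaps over a sleepy dog".split()
print(novelty(summary, source, n=2))
```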

    Socratic Pretraining: Question-Driven Pretraining for Controllable Summarization

    Full text link
    In long document controllable summarization, where labeled data is scarce, pretrained models struggle to adapt to the task and effectively respond to user queries. In this paper, we introduce Socratic pretraining, a question-driven, unsupervised pretraining objective specifically designed to improve controllability in summarization tasks. By training a model to generate and answer relevant questions in a given context, Socratic pretraining enables the model to more effectively adhere to user-provided queries and identify relevant content to be summarized. We demonstrate the effectiveness of this approach through extensive experimentation on two summarization domains, short stories and dialogue, and multiple control strategies: keywords, questions, and factoid QA pairs. Our pretraining method relies only on unlabeled documents and a question generation system and outperforms pre-finetuning approaches that use additional supervised data. Furthermore, our results show that Socratic pretraining cuts task-specific labeled data requirements in half, is more faithful to user-provided queries, and achieves state-of-the-art performance on QMSum and SQuALITY. Comment: To appear at ACL 2023.
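
    The question-driven objective described above can be pictured as building (input, target) pretraining pairs from unlabeled documents with an off-the-shelf question generation system. The sketch below is a rough illustration under that assumption; the pair format, the special tokens, and the generate_questions helper are hypothetical and not taken from the paper.

```python
# Hedged sketch: building question-driven (input, target) pretraining pairs from
# unlabeled documents with an external question generation system. The pair
# format, the special tokens, and the generate_questions helper are assumptions
# made for illustration; they are not taken from the paper.

from typing import Callable, List, Tuple

def build_question_driven_pairs(
    documents: List[str],
    generate_questions: Callable[[str], List[Tuple[str, str]]],
) -> List[Tuple[str, str]]:
    """For each document, produce text pairs in which the model must both pose
    a relevant question and answer it from the given context."""
    pairs = []
    for doc in documents:
        for question, answer in generate_questions(doc):
            model_input = f"{doc} <ask>"                    # context plus a prompt token
            model_target = f"{question} <answer> {answer}"  # question, then its answer
            pairs.append((model_input, model_target))
    return pairs

# Example with a trivial stand-in question generator (purely illustrative):
def toy_question_generator(doc: str) -> List[Tuple[str, str]]:
    return [("What is the document about?", doc.split(".")[0])]

print(build_question_driven_pairs(
    ["Dialogue summarization is hard. It needs long-range context."],
    toy_question_generator,
))
```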

    Training Neural Models for Abstractive Text Summarization

    No full text
    Abstractive text summarization aims to condense long textual documents into a short, human-readable form while preserving the most important information from the source document. A common approach to training summarization models is maximum likelihood estimation with the teacher forcing strategy. Despite its popularity, this method has been shown to yield models with suboptimal performance at inference time. This work examines how alternative, task-specific training signals affect the performance of summarization models. Two novel training signals are proposed and evaluated: a novelty metric that measures the n-gram overlap between the summary and the summarized article, and a discriminator model that distinguishes human-written summaries from generated ones at the word level. Empirical results show that using these metrics as rewards for policy gradient training yields significant performance gains as measured by ROUGE scores, novelty scores, and human evaluation.
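
    The reward-based training mentioned above is a standard policy-gradient setup: sample a summary, score it with a task-specific metric (such as the novelty score or a word-level discriminator), and scale the sequence log-probability by that reward. The sketch below uses a self-critical baseline and PyTorch tensor shapes as assumptions; it is not the work's exact training loop.

```python
# Hedged sketch: a REINFORCE-style policy-gradient loss where a sampled summary
# is scored by a task-specific reward (e.g. the novelty metric or a word-level
# discriminator score). The self-critical baseline and tensor shapes are
# assumptions, not the work's exact training loop.

import torch

def policy_gradient_loss(
    sample_log_probs: torch.Tensor,   # (batch, seq_len): log-probs of the sampled tokens
    sample_reward: torch.Tensor,      # (batch,): reward of the sampled summary
    baseline_reward: torch.Tensor,    # (batch,): reward of a greedy/baseline summary
) -> torch.Tensor:
    """Minimize -(r_sample - r_baseline) * sum(log p(sampled tokens))."""
    advantage = (sample_reward - baseline_reward).detach()   # no gradient through the reward
    sequence_log_prob = sample_log_probs.sum(dim=1)          # (batch,)
    return -(advantage * sequence_log_prob).mean()

# Example usage with dummy tensors standing in for model outputs:
log_probs = torch.log_softmax(torch.randn(2, 5, requires_grad=True), dim=-1)
loss = policy_gradient_loss(log_probs, torch.tensor([0.8, 0.3]), torch.tensor([0.5, 0.4]))
loss.backward()  # would update the summarizer's parameters in a real training step
```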

    Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors

    Full text link
    The propensity of abstractive summarization systems to make factual errors has been the subject of significant study, including work on models to detect factual errors and annotation of errors in current systems' outputs. However, the ever-evolving nature of summarization systems, error detectors, and annotated benchmarks makes factuality evaluation a moving target; it is hard to get a clear picture of how techniques compare. In this work, we collect labeled factuality errors from across nine datasets of annotated summary outputs and stratify them in a new way, focusing on what kind of base summarization model was used. To support finer-grained analysis, we unify the labeled error types into a single taxonomy and project each of the datasets' errors into this shared labeled space. We then contrast five state-of-the-art error detection methods on this benchmark. Our findings show that benchmarks built on modern summary outputs (those from pre-trained models) show significantly different results than benchmarks using pre-Transformer models. Furthermore, no one factuality technique is superior in all settings or for all error types, suggesting that system developers should take care to choose the right system for their task at hand. Comment: 11 pages (15 with references and appendix), 4 figures, 8 tables.
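
    One concrete step in the analysis above is projecting each dataset's annotated error labels into a single shared taxonomy. The sketch below shows what such a projection might look like; the dataset names, label names, and mapping are hypothetical placeholders, not the taxonomy defined in the paper.

```python
# Hedged sketch: projecting dataset-specific factuality labels into one shared
# error taxonomy. The dataset names, label names, and mapping below are
# hypothetical placeholders, not the taxonomy defined in the paper.

from typing import Dict

LABEL_MAPS: Dict[str, Dict[str, str]] = {
    "dataset_a": {"entity_error": "intrinsic", "hallucination": "extrinsic"},
    "dataset_b": {"wrong_subject": "intrinsic", "unsupported": "extrinsic"},
}

def project_label(dataset: str, label: str) -> str:
    """Map an annotated error label into the shared label space."""
    return LABEL_MAPS.get(dataset, {}).get(label, "other")  # unmapped labels fall back to a catch-all

print(project_label("dataset_a", "hallucination"))  # -> "extrinsic"
```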